❄️ PSYCH 413 Final Exam ❄️

Due Friday Dec 19th by 08:30 a.m.

Instructions (Ignore at your own risk)

General

  • Please submit your responses as a nicely formatted notebook (.ipynb) file.
  • If your answer includes many decimal places (e.g., 3.14159265358979), please round it to a reasonable number of decimals for readability (typically 3 to 4).
  • For ease of marking, avoid displaying unnecessary intermediate outputs.
  • Make sure your code runs without errors and shows all required outputs.
  • Good coding style matters: clean, organized, and well-commented code will be rewarded; disorganized, redundant, or poorly structured code may lose marks.

Requirements (IMPORTANT):

  • Unless otherwise specified, set any trimming you need to do at 20% and use an \(\alpha = 0.05\).
  • Unless the instructions explicitly state otherwise (e.g., “assume the data are normally distributed”), you are responsible for checking whether the assumptions of the method are reasonable and whether robust approaches need to be used.
    • If classical test assumptions are not violated, use the classical test.
  • Do not remove outliers unless explicitly asked to.
  • Respect the principle of marginality.

Plots, Packages, and Functions:

  • All plots must be made with ggplot2 and include clear, descriptive axis titles rather than default column names.
  • Only the following packages are permitted:
    • tidyverse, WRS2, car
  • Unless stated otherwise, the following functions are not permitted:
    • IQR(), quantile(), mad(), t.test(), yuen(), trimse(), anova(), aov(), chisq.test()

Question 1

Many people mistakenly believe that behavioural change following punishment depends only on the severity of a consequence. In reality, the effectiveness of punishment procedures depends on more nuanced factors, including the degree of correlation between the response and the punisher (known as contingency) and the delay between the response and the punisher (known as contiguity).

The dataset phone_use_response_cost.csv examines the effects of contingency and delay on phone-use behaviour in students undergoing a classroom behaviour program. Each time a student used their phone during classroom hours, the student sometimes lost cellular and Wi-Fi access on their device for 1 hour (a negative punisher, because access to a valued resource was removed).

Independent Variables:

Punishment contingency was manipulated at four levels:

  • 1.0 = High contingency (loss of connectivity always followed phone use)
  • 0.75 = Medium contingency (loss of connectivity usually followed phone use)
  • 0.25 = Low contingency (loss of connectivity sometimes followed phone use)
  • 0 = No contingency (loss of connectivity was unrelated to phone use and therefore unpredictable)

Importantly, all students experienced the same total number of one-hour connectivity losses across conditions; only the degree to which loss of connectivity was contingent on phone use varied.

Punishment delay was manipulated at four levels:

  • 0 = Immediate (connectivity was removed immediately when the phone was used)
  • 30s = Short delay (connectivity was removed 30 seconds after phone use)
  • 60s = Moderate delay (connectivity was removed 60 seconds after phone use)
  • 300s = Long delay (connectivity was removed 5 minutes after phone use)

Dependent Variable

The effectiveness of the program was measured using a suppression ratio, reflecting the extent to which students reduced their phone use during class relative to baseline. Lower suppression ratios indicate greater suppression of phone use (i.e., more effective punishment). Suppression ratios around 0.5 indicate no suppression.
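The dataset description does not give the formula for the suppression ratio, but a common definition in the conditioning literature (an assumption here, not stated in the exam) is:

\( SR = \dfrac{R_{\text{class}}}{R_{\text{class}} + R_{\text{baseline}}} \)

where \(R_{\text{class}}\) is the rate of phone use during class and \(R_{\text{baseline}}\) is the baseline rate. Equal rates give \(SR = 0.5\) (no suppression) and complete suppression gives \(SR = 0\), consistent with the interpretation above.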

Assume the data satisfy all the necessary assumptions for a classic 4 × 4 independent ANOVA.

Task

Create a professional-looking interaction plot for suppression ratio as a function of punishment contingency and punishment delay, including 95% confidence intervals.

Display a data frame showing the group means and the corresponding confidence-interval boundaries used in your plot.

Based on the plot, does the pattern of results suggest an interaction between punishment contingency and punishment delay?

library(tidyverse)

# Load data
phone <- read_csv("data/phone_use_response_cost.csv")

# Factor columns and organize
phone$contingency <- factor(phone$contingency)
phone$delay <- factor(phone$delay, levels = c("0s", "30s", "60s", "300s"))

# Get stats
plot_data <- phone |>
  group_by(contingency, delay) |>
  summarise(
    n = length(supp_rat),
    m = mean(supp_rat),
    se = sd(supp_rat) / sqrt(n),
    df = n - 1,
    alpha = 0.05,
    t_crit = abs(qt(alpha / 2, df = df)),
    low_ci = m - t_crit * se,
    top_ci = m + t_crit * se
  )
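
The summarise() call above computes, for each group, the classical normal-theory confidence interval:

\( \bar{X} \pm t_{(\alpha/2,\ n-1)} \cdot \dfrac{s}{\sqrt{n}} \)

where \(s\) is the group standard deviation and \(t_{(\alpha/2,\ n-1)}\) is the two-sided critical value obtained via qt().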

plot_data |> 
  select(contingency, delay, m, low_ci, top_ci)
# A tibble: 16 × 5
# Groups:   contingency [4]
   contingency delay          m     low_ci     top_ci
   <fct>       <fct>      <dbl>      <dbl>      <dbl>
 1 0           0s    0.1543872  0.1288664  0.1799080 
 2 0           30s   0.2243884  0.1960464  0.2527303 
 3 0           60s   0.2972303  0.2654620  0.3289986 
 4 0           300s  0.5020985  0.4893162  0.5148808 
 5 0.25        0s    0.1317080  0.1081301  0.1552860 
 6 0.25        30s   0.2044674  0.1827565  0.2261782 
 7 0.25        60s   0.2204094  0.1899424  0.2508764 
 8 0.25        300s  0.4577217  0.4486596  0.4667838 
 9 0.75        0s    0.06474551 0.04354108 0.08594995
10 0.75        30s   0.1321713  0.1092946  0.1550480 
11 0.75        60s   0.1399080  0.1100236  0.1697923 
12 0.75        300s  0.4051977  0.3807626  0.4296327 
13 1           0s    0.04904963 0.03368443 0.06441482
14 1           30s   0.09366966 0.07786461 0.1094747 
15 1           60s   0.1111369  0.09545035 0.1268234 
16 1           300s  0.2041352  0.1778866  0.2303838 
# Plot
palette <- palette.colors(palette = "Dark 2")

ggplot(plot_data, aes(
  x = delay, y = m,
  group = contingency,
  colour = contingency,
  shape = contingency,
  fill = contingency
)) +
  geom_line(position = position_dodge(width = 0.3), linewidth = 1) +
  geom_point(position = position_dodge(width = 0.3), size = 3.5) +
  geom_errorbar(aes(ymin = low_ci, ymax = top_ci),
    width = 0.25,
    linewidth = 1,
    position = position_dodge(width = 0.3)
  ) +
  coord_cartesian(ylim = c(0, 0.5)) +
  scale_color_manual(values = palette) +
  scale_fill_manual(values = palette) +
  scale_shape_manual(values = 21:24) +
  xlab("Delay") +
  ylab("Suppression Ratio") +
  labs(shape = "Contingency", colour = "Contingency", fill = "Contingency") +
  theme(
    axis.text = element_text(colour = "black", size = 12),
    axis.title = element_text(colour = "black", size = 14)
  )

The four lines are not parallel, suggesting a possible interaction.

Note

Contingency could have been put on the x-axis instead.


Question 2

Using the phone_use_response_cost.csv data, fit a classic ANOVA model that predicts suppression ratio from both contingency and delay, including their interaction (i.e., a classic 4 × 4 factorial ANOVA).

The researchers were interested in a set of ordered, theory-driven contrasts for each independent variable, reflecting progressively finer distinctions in the effectiveness of their punishment method.

Contingency:

  1. Maximal contingency hypothesis: Whether perfect contingency (1.00) produced greater suppression of phone use than the average of all lower contingency conditions (0.75, 0.25, 0).

  2. Moderate contingency hypothesis: Whether high but imperfect contingency (0.75) produced greater suppression than the average of weak or absent contingency conditions (0.25, 0).

  3. Minimal contingency hypothesis: Whether weak contingency (0.25) produced greater suppression than no contingency (0).

Delay:

  1. Immediate punishment hypothesis: Whether immediate punishment (0 s) produced greater suppression of phone use than the average of all delayed punishment conditions (30, 60, 300 s).

  2. Short vs long delay hypothesis: Whether a short delay (30 s) produced greater suppression than the average of longer delays (60 s, 300 s).

  3. Long-delay differentiation hypothesis: Whether moderate (60 s) and long (300 s) delays differed in their effectiveness.

When creating your model (with interactions), set up custom contrasts that evaluate these hypotheses.

For each main effect and interaction, report the F-ratio, degrees of freedom, and p-value. For this question, all functions from allowed packages are permitted.

# Contingency factor order
levels(phone$contingency)
[1] "0"    "0.25" "0.75" "1"   
# Delay factor order
levels(phone$delay)
[1] "0s"   "30s"  "60s"  "300s"
# Contrasts for contingency (3)
max_cont <- c(-1, -1, -1, 3)
mod_cont <- c(-1, -1, 2, 0)
min_cont <- c(-1, 1, 0, 0)
contrasts(phone$contingency) <- cbind(max_cont, mod_cont, min_cont)

# Contrasts for delay (3)
immediate <- c(3, -1, -1, -1)
short <- c(0, 2, -1, -1)
long <- c(0, 0, 1, -1)
contrasts(phone$delay) <- cbind(immediate, short, long)

# Model
model <- lm(supp_rat ~ contingency + delay + contingency:delay, data = phone)

# Is data balanced? (nope)
table(phone$contingency, phone$delay)
      
       0s 30s 60s 300s
  0    15  14  15   14
  0.25 15  15  14   15
  0.75 14  14  15   15
  1    14  13  15   15
# Type II Sum of Squares (to respect principle of marginality)
library(car)
effects <- Anova(model, type = "II")
effects
Anova Table (Type II tests)

Response: supp_rat
                   Sum Sq  Df F value    Pr(>F)    
contingency       1.09854   3 215.862 < 2.2e-16 ***
delay             2.79960   3 550.122 < 2.2e-16 ***
contingency:delay 0.25103   9  16.442 < 2.2e-16 ***
Residuals         0.36641 216                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Question 3

Calculate an \(\omega^2\) effect size for the ANOVA’s effects. Do not round your outputs for this question, display as much precision as possible.

Given this and the previous question, describe your findings of the main effects and interaction.

# Extracting values from ANOVA table
MS_r <- effects$`Sum Sq`[4] / effects$Df[4]
SST <- sum(effects$`Sum Sq`)

# Contingency, strain, interaction, omega^2 values
(effects$`Sum Sq`[1:3] - effects$Df[1:3] * MS_r) / (SST + MS_r)
[1] 0.24205888 0.61862853 0.05219055
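
The vectorized line above implements the standard \(\omega^2\) formula for a between-subjects factorial design:

\( \hat{\omega}^2_{\text{effect}} = \dfrac{SS_{\text{effect}} - df_{\text{effect}} \cdot MS_{\text{residual}}}{SS_{\text{total}} + MS_{\text{residual}}} \)

where \(SS_{\text{total}}\) is the sum of all the sums of squares in the Type II table (including the residual).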

Contingency had a large, significant effect on suppression ratio, and delay likewise had a large, significant effect. There was also a small but significant interaction between contingency and delay.


Question 4

Are all the contrasts you created centered and orthogonal? Demonstrate your answer with R code, and display the table(s) of contrast values/weights used.

Tables of Contrasts:

contrasts(phone$contingency)
     max_cont mod_cont min_cont
0          -1       -1       -1
0.25       -1       -1        1
0.75       -1        2        0
1           3        0        0
contrasts(phone$delay)
     immediate short long
0s           3     0    0
30s         -1     2    0
60s         -1    -1    1
300s        -1    -1   -1

Centering:

colSums(contrasts(phone$contingency))
max_cont mod_cont min_cont 
       0        0        0 
colSums(contrasts(phone$delay))
immediate     short      long 
        0         0         0 

Orthogonality:

t(contrasts(phone$contingency)) %*% contrasts(phone$contingency)
         max_cont mod_cont min_cont
max_cont       12        0        0
mod_cont        0        6        0
min_cont        0        0        2
t(contrasts(phone$delay)) %*% contrasts(phone$delay)
          immediate short long
immediate        12     0    0
short             0     6    0
long              0     0    2
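
The matrix products above verify both properties at once: the off-diagonal zeros are the pairwise dot products between contrast columns. As a standalone sketch, the same checks can be written explicitly with the contingency weights from Question 2:

# Contrast weights for contingency, as defined in Question 2
max_cont <- c(-1, -1, -1, 3)
mod_cont <- c(-1, -1, 2, 0)
min_cont <- c(-1, 1, 0, 0)

# Centred: each set of weights sums to zero
stopifnot(sum(max_cont) == 0, sum(mod_cont) == 0, sum(min_cont) == 0)

# Orthogonal: every pairwise dot product is zero
stopifnot(
  sum(max_cont * mod_cont) == 0,
  sum(max_cont * min_cont) == 0,
  sum(mod_cont * min_cont) == 0
)

The delay contrasts pass the same checks. Note these dot products are unweighted; with the unequal group sizes seen in Question 2, the contrasts are orthogonal in their weights but not exactly orthogonal in the design, which is one more reason Type II sums of squares were used.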


Question 5

Display the results table for your contrasts. Then, for each of the six hypotheses (main effects contrasts), state the conclusion you draw from these in plain English (i.e., without using numerical results).

# Contrast results (rounding for readability)
results <- round(summary(model)$coefficients, 3)
results
                                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                           0.212      0.003  78.338    0.000
contingencymax_cont                  -0.033      0.002 -20.672    0.000
contingencymod_cont                  -0.030      0.002 -13.380    0.000
contingencymin_cont                  -0.020      0.004  -5.374    0.000
delayimmediate                       -0.037      0.002 -23.908    0.000
delayshort                           -0.043      0.002 -19.216    0.000
delaylong                            -0.100      0.004 -26.378    0.000
contingencymax_cont:delayimmediate    0.005      0.001   5.686    0.000
contingencymod_cont:delayimmediate    0.001      0.001   0.889    0.375
contingencymin_cont:delayimmediate    0.003      0.002   1.397    0.164
contingencymax_cont:delayshort        0.007      0.001   5.493    0.000
contingencymod_cont:delayshort        0.002      0.002   0.893    0.373
contingencymin_cont:delayshort        0.007      0.003   2.170    0.031
contingencymax_cont:delaylong         0.018      0.002   8.200    0.000
contingencymod_cont:delaylong        -0.007      0.003  -2.385    0.018
contingencymin_cont:delaylong        -0.008      0.005  -1.499    0.135

All six main-effect contrasts were significant.

Maximal contingency hypothesis:

  • Perfect contingency suppresses phone use more than all weaker or absent contingencies combined.

Moderate contingency hypothesis:

  • High but imperfect contingency suppresses phone use more than weak or absent contingency.

Minimal contingency hypothesis:

  • Even a weak contingency suppresses phone use more than no contingency at all.

Immediate punishment hypothesis:

  • Immediate punishment suppresses phone use more than punishment delivered after any delay.

Short vs long delay hypothesis:

  • Punishment delivered after a short delay suppresses phone use more than punishment delivered after longer delays.

Long-delay differentiation hypothesis:

  • Punishment delivered after a moderate delay suppresses phone use more than punishment delivered after a very long delay.

Note

For full marks you needed to indicate the direction of the change, not just whether there was a change in suppression.


Question 6

Considering the potential interactions, does a perfect contingency provide an advantage over weaker contingencies when the delay increases from moderate to long?

Report/extract the following:

  • Test-statistic
  • Degrees of freedom
  • P-value
  • Partial \(\eta^2\)
# Note: this specific test is found in row 14 of the coefficient table
t_stat <- results[14, 3]
df <- effects$Df[4]

# Eta^2
eta_sq <- t_stat^2 / (t_stat^2 + df)

# Results
paste0("Test Stat = ", round(t_stat, 5))
paste0("DF = ", df)
paste0("p = ", results[14, 4])
paste0("Partial eta^2 = ", round(eta_sq, 5))
[1] "Test Stat = 8.2"
[1] "DF = 216"
[1] "p = 0"
[1] "Partial eta^2 = 0.2374"
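
The conversion from the contrast’s t statistic to a partial effect size uses:

\( \eta^2_p = \dfrac{t^2}{t^2 + df_{\text{residual}}} \)

which is equivalent to \(F / (F + df_{\text{residual}})\) for a single-df contrast, since \(t^2 = F\).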

The difference between perfect contingency and weaker contingencies is not the same at 60 seconds as it is at 300 seconds. Specifically, moving from a moderate to long delay reduces the punishing effect at weaker (non-perfect) contingencies.




Question 7

A blogger named Josh Madison tabulated the number of M&Ms of each colour in a variety of M&M packages. The data can be found in MM_Madison.csv. Calculate a p-value to determine whether the data provide evidence that the colours are equally represented overall or whether some colours are systematically more common than others across packages. Assume that the amount of each colour in a package is independent of the other colours.


library(tidyverse)

# Load data and convert to wide format
mm <- read_csv("data/MM_Madison.csv") |>
  pivot_wider(names_from = colour, values_from = amount) |>
  select(-c(pkg, weight_oz, year))

# Calculate sums (observed frequencies)
obs <- colSums(mm)

# Expected frequencies
N <- sum(obs)
exp <- rep(N / 6, 6)

# Chi-Sq test stat
chi_sq <- sum(((obs - exp)^2) / exp)

# Degrees of freedom
df <- 6 - 1

# P-value
chi_sq
[1] 65.7374
pchisq(chi_sq, df = df, lower.tail = FALSE)
[1] 7.879637e-13
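
The hand calculation above is the chi-square goodness-of-fit statistic:

\( \chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}, \qquad df = k - 1 \)

with \(k = 6\) colours and expected counts \(E_i = N/6\) under the null hypothesis of equal representation.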


Question 8

Assuming you calculated the previous question correctly, the test you conducted has a power of 0.98 to detect a small effect.

What does this imply about how the result of the hypothesis test should be interpreted? (You do not need to perform any additional calculations.)

The high power means that, if there is a true difference in colours, we can be very confident it will be detected. However, power above 0.9 is often more than necessary and allows even trivial departures from the null hypothesis to be detected (i.e., there may be a true difference, but not one large enough to care about). Consequently, p-values should be interpreted alongside effect sizes to gauge the practical importance of the difference.

Other points you could potentially make
  • Waste of resources.
  • Ethically questionable (though not in this case).
  • Test ceases to be a test (it’s a guarantee).




Question 9

Holiday traditions often come with strong opinions (especially about candy). Some people swear that having a little holiday candy nearby makes stressful moments feel more manageable, while others think it’s just sugar and wishful thinking.

A local psychologist wanted to test whether having holiday candy available during a stressful task affects how anxious people feel.

To test this, participants were asked to complete a timed puzzle task in a softly lit room while a loop of chaotic “holiday mall music” played in the background (to create a mildly stressful, seasonally-themed atmosphere).

In one condition, a small plate of holiday sweets was beside them and they were told they could look at it but not eat it during the task. In the other condition, no sweets were present.

During each session, the psychologist measured mean heart rate (in beats per minute) using a chest-strap monitor. Higher heart rate is presumed to indicate greater physiological arousal/anxiety.

The data file holiday_candy_hr.csv contains the following columns:

  • id — Unique random ID for each participant.
  • condition — “Candy Present” or “No Candy”.
  • hr_bpm — Heart rate (beats per minute) recorded during the task.

The researcher hypothesizes that the “No Candy” condition will produce higher levels of anxiety (i.e., higher heart rate) than the “Candy Present” condition. Conduct an appropriate statistical test to evaluate whether the data provide sufficient evidence to support the researcher’s hypothesis.

Report all of the following:

  • Null and alternative hypothesis
  • Test statistic
  • Degrees of freedom
  • p-value
  • 95% confidence interval
# Load data (note data file's rows are out of order)
hr <- read_csv("data/holiday_candy_hr.csv") |>
  arrange(id)

head(hr) # look at first 6 rows (observations are PAIRED)
# A tibble: 6 × 3
     id condition   hr_bpm
  <dbl> <chr>        <dbl>
1     1 No Candy  84.53742
2     1 Candy     73.57114
3     2 Candy     88.03545
4     2 No Candy  70.87695
5     3 No Candy  93.24787
6     3 Candy     62.31203
# Check normality of difference scores
diff_hr <-
  hr$hr_bpm[hr$condition == "No Candy"] - hr$hr_bpm[hr$condition == "Candy"]

ggplot(mapping = aes(sample = diff_hr)) +
  stat_qq() +
  stat_qq_line() +
  labs(x = "Theoretical Quantiles", y = "Sample Quantiles")

The normality assumption is not reasonable. A (one-sample / paired) trimmed t-test needs to be used on the difference scores.

\(H_0: \mu_{_\text{No Candy}} - \mu_{_\text{Candy}} \leq 0\)

\(H_1: \mu_{_\text{No Candy}} - \mu_{_\text{Candy}} > 0\)

Note

To support the researcher’s hypothesis, it needs to be \(H_1\).

library(WRS2)
# Note: difference scores were calculated above

# Basic stats
N <- nrow(hr) / 2 # number of pairs
G <- 0.2
m_diff <- mean(diff_hr, trim = G) # base mean() uses 'trim', not 'tr'
s <- sqrt(winvar(diff_hr, tr = G))
se <- s / ((1 - 2 * G) * sqrt(N))
h <- N - 2 * floor(G * N)
df <- h - 1

# Test stat
mu_diff <- 0
t_stat <- (m_diff - mu_diff) / se

# P-value
p <- pt(t_stat, df = df, lower.tail = FALSE)

# Confidence interval
alpha <- 0.05
t_crit <- abs(qt(alpha, df = df))

low_ci <- m_diff - t_crit * se
top_ci <- Inf # one-sided test: the upper bound is unbounded

# Results
paste0("Test Stat = ", round(t_stat, 5))
paste0("df = ", round(df, 5))
paste0("p = ", round(p, 5))
paste0("[", round(low_ci, 3), ", ", round(top_ci, 3), "]")
[1] "Test Stat = 2.65658"
[1] "df = 47"
[1] "p = 0.00537"
[1] "[1.202, Inf]"
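
For reference, the code implements the standard one-sample trimmed t test (applied to the paired difference scores):

\( t = \dfrac{\bar{X}_t - \mu_0}{SE}, \qquad SE = \dfrac{s_w}{(1 - 2\gamma)\sqrt{N}}, \qquad h = N - 2\lfloor \gamma N \rfloor, \qquad df = h - 1 \)

where \(\gamma = 0.2\) is the trimming proportion, \(\bar{X}_t\) is the trimmed mean, and \(s_w\) is the winsorized standard deviation (obtained from winvar()).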


Question 10

Create an appropriate bar plot to visualize the results from the previous question. Include two-sided 95% confidence intervals for each bar. Additionally, display a data frame containing the summary statistics shown in the plot.

Note

The data are paired and non-normal. The plot needs to account for those facts.

# Add difference scores to data
diff_df <- data.frame(
  id = 1:N,
  condition = "Difference",
  hr_bpm = diff_hr
)

hr <- rbind(hr, diff_df)

# Factoring to adjust condition ordering
hr$condition <-
  factor(hr$condition, levels = c("No Candy", "Candy", "Difference"))

# Get stats for plot
plot_data <- hr |>
  group_by(condition) |>
  summarise(
    n = length(hr_bpm),
    G = 0.2,
    m = mean(hr_bpm, trim = G),
    s = sqrt(winvar(hr_bpm, tr = G)),
    se = s / ((1 - 2 * G) * sqrt(n)),
    h = n - 2 * floor(G * n),
    df = h - 1,
    alpha = 0.05,
    t_crit = abs(qt(alpha / 2, df = df)),
    low_ci = m - t_crit * se,
    top_ci = m + t_crit * se
  )

# Data frame
plot_data |>
  select(condition, m, low_ci, top_ci)
# A tibble: 3 × 4
  condition          m     low_ci    top_ci
  <fct>          <dbl>      <dbl>     <dbl>
1 No Candy   79.87594  77.85341   81.89848 
2 Candy      76.00632  74.32401   77.68863 
3 Difference  3.261754  0.7917384  5.731770
# Plot
ggplot(plot_data, aes(x = condition, y = m)) +
  geom_bar(
    stat = "identity",
    colour = "black",
    fill = "#BB171A"
  ) +
  geom_errorbar(
    aes(ymin = low_ci, ymax = top_ci),
    width = 0.25
  ) +
  xlab("Condition") +
  ylab("Heart Rate (bpm; 20% Trimming)")




Question 11

Over the years, several Alpine villages have reported a chilling pattern of wintertime disturbances during the Advent season. Local folklore attributes these events to Krampus, a legendary horned figure said to justly punish the wicked in the weeks leading up to Christmas. According to tradition, Krampus is most active during cold, dark nights, particularly when supernatural conditions are said to be strongest.

A group of cultural anthropologists and behavioural scientists has compiled a historical dataset of reported Krampus-related incidents from village records spanning 1890–2024. The dataset, krampus_incidents.csv, contains the following variables:

  • incident_severity: a continuous measure of severity of the incident developed by the researchers. It is on a scale of 0–100, where 100 indicates extreme harm or disruption.
  • snow_density: a continuous measure of snowfall intensity on the night of the incident (in % opacity, 0–100).
  • guard_exp_lvl: experience level of any village night guard present:
    • “None” (no guard present),
    • “Novice” (inexperienced guard),
    • “Experienced” (seasoned guard).
  • krampusnacht: categorical variable (Yes = 1, No = 0) indicating whether the incident occurred on Krampusnacht (December 5th), traditionally believed to be the peak of Krampus activity.

The researchers hypothesize that harsh environmental conditions (e.g., heavy snowfall) may increase the severity of incidents, particularly on Krampusnacht. They are also interested in whether the presence and experience level of village guards moderates these effects.

Your Task

Plot incident severity as a function of snow density, separately for incidents occurring on Krampusnacht and not on Krampusnacht.

Display separate OLS regression lines (incident_severity ~ snow_density) for Krampusnacht and non-Krampusnacht nights.

Format the plot to have a professional appearance suitable for inclusion in an academic article (because this is definitely real data you are analyzing).

If necessary, adjust the plot size to ensure optimal display within Google Colab.


There are different approaches you could take for this.

Option 1: Plot facets - space is not used optimally with this method, but it provides the clearest contrast of the two categories and makes it easy to visualize potential trends.

krampus <- read_csv("data/krampus_incidents.csv")

krampus$Dec_5 <-
  factor(krampus$krampusnacht, labels = c("No", "Yes"))

ggplot(krampus, aes(x = snow_density, y = incident_severity)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = FALSE, colour = "#BB171A", linewidth = 1.5) +
  facet_wrap(~Dec_5, nrow = 2) +
  labs(x = "Snow Density", y = "Incident Severity") +
  scale_x_continuous(breaks = seq(0, 100, 20)) +
  scale_y_continuous(breaks = seq(0, 100, 20)) +
  theme(
    axis.text = element_text(colour = "black", size = 12),
    strip.text = element_text(colour = "black", size = 12),
    axis.title = element_text(size = 14)
  )

Option 2: Distinguish the categories via aesthetics (in particular the shape aesthetic so that the plot is robust to colour blindness). Care needs to be taken to make sure “Yes” values are not hidden by overplotting. This is tricky to do effectively and elegantly.

# Arranging values so that "Yes" is plotted on top of "No"
krampus <- krampus |>
  arrange(Dec_5)

ggplot(
  krampus,
  aes(
    x = snow_density,
    y = incident_severity,
    colour = Dec_5,
    fill = Dec_5,
    shape = Dec_5
  )
) +
  geom_point(size = 2.5, stroke = 0.35) +
  scale_shape_manual(values = c(21, 24)) +
  scale_colour_manual(values = c("#0072B2", "#D55E00")) +
  scale_fill_manual(values = c(NA, "#D55E00")) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.5, show.legend = FALSE) +
  labs(
    x = "Snow Density",
    y = "Incident Severity",
    shape = "Krampusnacht",
    colour = "Krampusnacht",
    fill = "Krampusnacht"
  ) +
  scale_x_continuous(breaks = seq(0, 100, 20)) +
  scale_y_continuous(breaks = seq(0, 100, 20)) +
  theme(
    axis.text = element_text(colour = "black", size = 12),
    strip.text = element_text(colour = "black", size = 12),
    axis.title = element_text(size = 14)
  )


Question 12

Using the krampus_incidents.csv data, create the following ordinary least-squares regression models using lm():

  • Model 1:
    • Predict incident severity as a function of snow density.
  • Model 2:
    • Extend Model 1 by adding the guard experience level.
  • Model 3:
    • Extend Model 2 by adding Krampusnacht.

Use the following coding scheme for the categorical variables:

Guard Experience:

Level        \(X_2\)  \(X_3\)
None            0        0
Novice          1        0
Experienced     0        1


Krampusnacht:

Level  \(X_4\)
No        0
Yes       1

Report each model’s equation and \(R^2\) value, and interpret the coefficients of the newly added predictor(s) in plain English.

# Factor categories
# Putting in custom order for convenience
krampus$guard_exp_lvl <-
  factor(krampus$guard_exp_lvl, levels = c("None", "Novice", "Experienced"))

# Set dummy coding
# The ifelse() function could also be used for this.
Novice <- c(0, 1, 0)
Experienced <- c(0, 0, 1)
contrasts(krampus$guard_exp_lvl) <- cbind(Novice, Experienced)

Yes <- c(0, 1)
contrasts(krampus$Dec_5) <- cbind(Yes)

# Models
mod1 <- lm(incident_severity ~ snow_density, data = krampus)
mod2 <- lm(incident_severity ~ snow_density + guard_exp_lvl, data = krampus)
mod3 <- lm(
  incident_severity ~ snow_density + guard_exp_lvl + Dec_5,
  data = krampus
)

mod1sum <- summary(mod1)
mod2sum <- summary(mod2)
mod3sum <- summary(mod3)

Model 1

\(\hat{y} = 16.369 + 0.563 x\)

\(R^2 = 0.381\)

  • For every one percentage point increase in snow density, the incident severity increases by 0.563 points on average.

Model 2

\(\hat{y} = 25.1 + 0.558 x_1 + (-5.823) x_2 + (-15.392) x_3\)

\(R^2 = 0.491\)

  • The presence of a novice guard reduces incident severity by 5.82 points on average.

  • The presence of an experienced guard reduces incident severity by 15.39 points on average.

Model 3

\(\hat{y} = 26.884 + 0.522 x_1 + (-6.381) x_2 + (-15.671) x_3 + 26.363 x_4\)

\(R^2 = 0.549\)

  • Incident severity is 26.36 points higher on Krampusnacht than on other nights, holding the other predictors constant.


Question 13

Is there reason to think model 3 should be preferred over model 1? Conduct a test to evaluate this.

# F-test
N <- nrow(krampus)
p_modB <- 4
p_diff <- 4 - 1
Rsq_diff <- mod3sum$r.squared - mod1sum$r.squared

F_stat <- ((N - p_modB - 1) / p_diff) *
  (Rsq_diff / (1 - mod3sum$r.squared))

# p-value
p <- pf(F_stat, df1 = p_diff, df2 = N - p_modB - 1, lower.tail = FALSE)

# Results
paste0("F = ", round(F_stat, 5))
paste0("DF1 = ", p_diff, " DF2 = ", N - p_modB - 1)
paste0("p = ", round(p, 5))
[1] "F = 110.46955"
[1] "DF1 = 3 DF2 = 895"
[1] "p = 0"
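
The statistic above is the standard F test for nested OLS models, expressed in terms of \(R^2\):

\( F = \dfrac{(R^2_{3} - R^2_{1}) / (p_3 - p_1)}{(1 - R^2_{3}) / (N - p_3 - 1)} \)

with \(p_3 = 4\) predictors in model 3 and \(p_1 = 1\) in model 1, giving \(df_1 = 3\) and \(df_2 = N - p_3 - 1\).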

Model 3 accounts for a statistically significant additional proportion of the variance (17%) and is therefore the preferred model.


Question 14

Refit model 3 using ordinary least squares (lm()), after identifying and removing potential outliers with a robust outlier-detection method. List all observations that were removed, and report the fitted regression equation and its \(R^2\) value.

# Note: Outliers should only be removed based on continuous independent/predictor variables
# (i.e., snow density).

# MADN stat
madn <-
  median(abs(krampus$snow_density - median(krampus$snow_density))) / 0.6745

# Detect outliers
krampus$outliers <-
  (abs(krampus$snow_density - median(krampus$snow_density)) / madn) > 2.24

# Outlier list
krampus |> filter(outliers == TRUE)
# A tibble: 3 × 8
   year date       incident_severity snow_density guard_exp_lvl krampusnacht
  <dbl> <date>                 <dbl>        <dbl> <fct>                <dbl>
1  1922 1922-11-28               1.8          7   Novice                   0
2  2019 2019-12-24              19.4         12.2 Novice                   0
3  2022 2023-01-01              23.3         11.3 Novice                   0
# ℹ 2 more variables: Dec_5 <fct>, outliers <lgl>
krampus_cln <- krampus |> filter(outliers != TRUE)

# Refit model 3
mod3 <- lm(
  incident_severity ~ snow_density + guard_exp_lvl + Dec_5,
  data = krampus_cln
)
mod3sum <- summary(mod3)

\(\hat{y} = 27.188 + 0.517 x_1 + (-6.309) x_2 + (-15.681) x_3 + 26.413 x_4\)

\(R^2 = 0.544\)
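
For reference, the outlier rule used above is the MADN criterion:

\( \text{MADN} = \dfrac{\text{median}\,|X_i - M|}{0.6745}, \qquad \dfrac{|X_i - M|}{\text{MADN}} > 2.24 \ \Rightarrow\ \text{flag as outlier} \)

where \(M\) is the sample median of snow density. Dividing by 0.6745 makes MADN a consistent estimator of \(\sigma\) under normality, and the cutoff 2.24 is approximately \(\sqrt{\chi^2_{0.975,\,1}}\).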


Question 15

Participants listened to an audio recording of an argument between two coworkers and were then asked: “How intense was the disagreement between the coworkers when they _______ with each other?”

The blank was filled with one of four verbs: “argued,” “disagreed”, “debated,” or “discussed.”

Participants responded by selecting one of three options: “very intense,” “moderately intense,” or “not very intense.”

The results are summarized in the following table:

            Very intense   Moderately intense   Not very intense
Argued            74               36                  30
Disagreed         58               60                  42
Debated           45               68                  47
Discussed         26               44                  80

Does the wording of the question (i.e., the verb used) significantly influence participants’ judgments of intensity?

Conduct an appropriate statistical test and report your conclusion, including the test statistic, degrees of freedom, and p-value.

data <- data.frame(
  very = c(74, 58, 45, 26),
  moderate = c(36, 60, 68, 44),
  not_very = c(30, 42, 47, 80)
)

rownames(data) <- c("argued", "disagreed", "debated", "discussed")

# Get row and column totals
RowTotal <- apply(data, MARGIN = 1, FUN = sum)
ColTotal <- apply(data, MARGIN = 2, FUN = sum)

# Get the grand total (N)
N <- sum(RowTotal)

# Get expected Frequencies
expected <- RowTotal %*% t(ColTotal) / N

# Calculate the test statistic
chiSq <- sum(((data - expected)^2) / expected)

# Degrees of freedom
df <- (4 - 1) * (3 - 1)

# p-value
p <- pchisq(chiSq, df = df, lower.tail = FALSE)

# Results
paste0("Test Stat = ", round(chiSq, 5))
paste0("DF = ", df)
paste0("p = ", round(p, 5))
[1] "Test Stat = 64.63626"
[1] "DF = 6"
[1] "p = 0"
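
The calculation implements the chi-square test of independence:

\( E_{ij} = \dfrac{R_i\, C_j}{N}, \qquad \chi^2 = \sum_{i,j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r - 1)(c - 1) \)

with \(r = 4\) verbs and \(c = 3\) response options.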

Yes, the wording of the question does appear to influence participants’ judgments of intensity (i.e., the question wording is not independent of the responses participants give).